Pipeline Object

We now have a pipeline object that can act on a 'feature' object by running a sequence of processing steps over it. This notebook demonstrates that functionality.
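As a rough mental model, a pipeline of this kind runs each step in order and hands the result on to the next step. The sketch below is a hypothetical stand-in, not nbminer's `Pipeline`. `MiniPipeline`, `Tokenize` and `CountTokens` are made-up names, and the sketch assumes each step exposes a `transform(features)` method that returns the (possibly modified) features. The real nbminer steps may instead update the `Features` object in place.

```python
from typing import Any, Iterable, List, Protocol


class Step(Protocol):
    """Anything with a transform(features) -> features method."""

    def transform(self, features: Any) -> Any: ...


class MiniPipeline:
    """Hypothetical pipeline: runs steps in order, feeding each
    step's output into the next one."""

    def __init__(self, steps: Iterable[Step]) -> None:
        self.steps: List[Step] = list(steps)

    def transform(self, features: Any) -> Any:
        for step in self.steps:
            features = step.transform(features)
        return features


class Tokenize:
    """Toy preprocessing step: split each code string on whitespace."""

    def transform(self, features):
        return [s.split() for s in features]


class CountTokens:
    """Toy results step: passes features through and keeps a summary."""

    def __init__(self):
        self.total = 0

    def transform(self, features):
        self.total = sum(len(tokens) for tokens in features)
        return features

    def get_summary(self):
        return f"Total tokens: {self.total}"


counter = CountTokens()
pipe = MiniPipeline([Tokenize(), counter])
result = pipe.transform(["x = 1", "print ( x )"])
print(counter.get_summary())  # Total tokens: 7
```

Keeping the results step in its own variable (`counter` here) mirrors how the notebook holds on to `ae`, so its summary can be read after the whole pipeline has run.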


In [1]:
# Necessary imports
import os
import time
from nbminer.notebook_miner import NotebookMiner
from nbminer.cells.cells import Cell
from nbminer.features.features import Features
from nbminer.stats.summary import Summary
from nbminer.stats.multiple_summary import MultipleSummary
from nbminer.encoders.ast_graph.ast_graph import ASTGraphReducer

In [2]:
# Collect every .ipynb file one level below the testbed directory
base_dir = '../testbed/Final'
notebooks = []
for person in os.listdir(base_dir):
    person_dir = os.path.join(base_dir, person)
    if os.path.isdir(person_dir):
        notebooks.extend(os.path.join(person_dir, filename)
                         for filename in os.listdir(person_dir)
                         if filename.endswith('.ipynb'))
# Mine only the first five notebooks and wrap them in a Features object
notebook_objs = [NotebookMiner(file) for file in notebooks[:5]]
a = Features(notebook_objs)

In [3]:
from nbminer.pipeline.pipeline import Pipeline
from nbminer.preprocess.get_ast_features import GetASTFeatures
from nbminer.preprocess.resample_by_node import ResampleByNode
from nbminer.results.reconstruction_error.astor_error import AstorError

gastf = GetASTFeatures()                                # step 1: extract AST features
rbn = ResampleByNode()                                  # step 2: resample by node
agr = ASTGraphReducer(a, threshold=5, split_call=True)  # step 3: bottom-up encoding/decoding
ae = AstorError()                                       # step 4: reconstruction error
pipe = Pipeline([gastf, rbn, agr, ae])
pipe.transform(a)
print(ae.get_summary())


<nbminer.preprocess.get_ast_features.GetASTFeatures object at 0x11528f710>
<nbminer.preprocess.resample_by_node.ResampleByNode object at 0x11528f6a0>
<nbminer.encoders.ast_graph.ast_graph.ASTGraphReducer object at 0x11528f668>
<nbminer.results.reconstruction_error.astor_error.AstorError object at 0x11528f6d8>
The average length of the original strings is: 51.694545454545455
The average length of the reconstructed strings is: 17.10909090909091
The average edit distance is: 40.75818181818182
The average number of characters in common is: 13.171770297949617

Pipeline info

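It works! We built a four-step pipeline: GetASTFeatures extracts AST features from the notebooks, ResampleByNode resamples them by node, ASTGraphReducer computes the bottom-up encoding and decoding, and AstorError compares the reconstructed code with the original to report the astor reconstruction error.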
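The summary reports an average edit distance between each original code string and its reconstruction. Assuming that metric is the standard Levenshtein distance (not confirmed here; AstorError may compute it differently), the computation can be sketched as follows. `levenshtein` and `average_edit_distance` are hypothetical helpers, and the code pairs are made-up examples.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def average_edit_distance(pairs):
    """Mean Levenshtein distance over (original, reconstructed) pairs."""
    pairs = list(pairs)
    return sum(levenshtein(o, r) for o, r in pairs) / len(pairs)


pairs = [("df = pd.read_csv(path)", "df = pd.read_csv(x)"),
         ("print(x)", "print(x)")]
print(average_edit_distance(pairs))  # 2.0
```

If the metric is Levenshtein, each distance is at least the length difference of its pair, so the averages above imply a floor of roughly 51.69 − 17.11 ≈ 34.6; the reported 40.76 sits above that floor, meaning the short reconstructions also differ in content, not just in length.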

